Manual annotations of temporal bounds for object interactions (i.e. start and end times) are typical training input to recognition, localization and detection algorithms. For three publicly available egocentric datasets, we uncover inconsistencies in ground truth temporal bounds within and across annotators and datasets. We systematically assess the robustness of state-of-the-art approaches to changes in labeled temporal bounds, for object interaction recognition. As boundaries are trespassed, a drop of up to 10% is observed for both Improved Dense Trajectories and Two-Stream Convolutional Neural Network. We demonstrate that such disagreement stems from a limited understanding of the distinct phases of an action, and propose annotating based on the Rubicon Boundaries, inspired by a similarly named cognitive model, for consistent temporal bounds of object interactions. Evaluated on a public dataset, we report a 4% increase in overall accuracy, and an increase in accuracy for 55% of classes when Rubicon Boundaries are used for temporal annotations.